Skip to content

Conversation

ianayl
Copy link
Contributor

@ianayl ianayl commented Oct 9, 2025

This PR:

  • moves benchmarking CI data to benchmark-ci-tests branch instead of intel/llvm-ci-perf-results
  • removes the need for an additional bot user when pushing data (as well as the need to periodically update its tokens)

Test run: https://github.com/intel/llvm/actions/runs/18386952310/job/52388325821 (job failed due to "regression"; issue should be addressed with the merge of #20277

@ianayl ianayl requested a review from a team as a code owner October 9, 2025 20:37
@ianayl ianayl requested a review from a team October 9, 2025 20:37
echo "PRESET=$PRESET" >> $GITHUB_ENV
# Set branch containing benchmark CI results:
echo "BENCHMARK_RESULTS_BRANCH=sycl-benchmark-ci-results" >> $GITHUB_ENV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps just set it as an env (in the job scope, or top-level)...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another way to set an env variable? I thought this is how you do it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you did it I usually use when I want to dynamically set something within the workflow.

If more of a static var is required, I usually use:

env:
  BUILD_DIR : "${{github.workspace}}/build"
  INSTALL_DIR: "${{github.workspace}}/build/install"

e.g. in UMF workflow:
https://github.com/oneapi-src/unified-memory-framework/blob/main/.github/workflows/nightly.yml#L14-L16

it will work in both "job" and "global" scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this defining the env variables for a single step only? i.e. variables defined this way do not propagate across steps

I use the variable again in actions/checkout call here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I mentioned, you can set it up in "job" and/or "global" scope - see my example above set up for global scope

I think, setting it up for step scope is only possible if you use with clause (e.g. for running reusable workflow or an action)

# Update benchmarking dashboard configuration
cat << EOF > benchmarks/config.js
remoteDataUrl = 'https://raw.githubusercontent.com/intel/llvm-ci-perf-results/refs/heads/unify-ci/';
remoteDataUrl = 'https://raw.githubusercontent.com/intel/llvm/refs/heads/sycl-benchmark-ci-results/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my note on this change - while it seems convenient, I can see on my PC, that repo intel/llvm-ci-perf-results is ~600MB - this will be extra MBs added to intel/llvm repo, which is already quite big - just to consider

Copy link
Contributor

@uditagarwal97 uditagarwal97 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianayl would it be possible to compress the benchmarking data before pushing it to intel/llvm ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory yes, but only the archived data. I think we'll need to have a conversation on this: it might even be reasonable to delete horribly outdated data.

We could also try moving archived data to intel/llvm-ci-perf-results, but alas that'll mean we need to keep an updated bot user token again for the repository.

Do github repositories have a max storage quota?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do github repositories have a max storage quota?

https://docs.github.com/en/repositories/creating-and-managing-repositories/repository-limits#repository-size

we'll need to have a conversation on this

I think we should have that conversation before merging this PR. Currently, the size of intel/llvm-ci-perf-results is ~600MB, but, if unchecked the size will increase further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory yes, but only the archived data. I think we'll need to have a conversation on this: it might even be reasonable to delete horribly outdated data.

Compressing results files gives ~190MB out of 320M, so I think it is better to just remove outdated data. We have data starting at March'25.

How many months of data do wee need to keep. 6? In this case we will have data staring from 15th April today. The rest will be in git history, hopefully well compressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also try moving archived data to intel/llvm-ci-perf-results, but alas that'll mean we need to keep an updated bot user token again for the repository.

If we go this way, we can archive e.g. once every second week, or something, and make a PR instead of pushing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants